Fault Tolerant Framework in MPI-based Distributed DEVS Simulation

Authors

  • Bin Chen
  • Xiao-gang Qiu

Abstract

Distributed DEVS simulation plays an important role in solving complex problems because of the reusability and composability of its component models. Using MPI as the communication middleware, distribution increases performance, but even a tiny fault in the computing resources can crash the simulation. Fault tolerance is therefore necessary to maintain simulation reliability. This paper introduces a DEVS framework that supports fault tolerance, in which optimistic distributed simulators implement the distribution of the DEVS simulation. Fault detection, state storage and fault recovery are integrated into the framework to avoid crashes at runtime. Experiments are carried out to find the optimal timeout for the fault-tolerant framework. The results indicate that the framework has to be adjusted as the simulation requirements change.
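The abstract names three mechanisms (fault detection, state storage, fault recovery) and a tunable timeout, but gives no implementation details. The following is a minimal single-process sketch of how such pieces can fit together, assuming a heartbeat-style timeout detector and in-memory checkpoints. The class and method names, the checkpoint format and the rollback rule are all hypothetical illustrations, not the authors' framework, and MPI message passing is replaced by direct method calls.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SimulatorNode:
    """Hypothetical stand-in for one distributed DEVS simulator process."""
    name: str
    state: dict = field(default_factory=dict)
    last_heartbeat: float = 0.0  # wall-clock time of the last message received


class FaultTolerantCoordinator:
    """Sketch (assumed design): timeout detection, state storage, recovery."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.nodes: dict[str, SimulatorNode] = {}
        # State storage: node name -> (simulation time, deep copy of the state)
        self.checkpoints: dict[str, tuple[float, dict]] = {}

    def register(self, node: SimulatorNode, now: float) -> None:
        node.last_heartbeat = now
        self.nodes[node.name] = node

    def heartbeat(self, name: str, now: float) -> None:
        # Any message from a node counts as evidence that it is still alive.
        self.nodes[name].last_heartbeat = now

    def save_state(self, name: str, sim_time: float) -> None:
        # Deep copy so later changes to the live state do not alter the checkpoint.
        node = self.nodes[name]
        self.checkpoints[name] = (sim_time, copy.deepcopy(node.state))

    def detect_faults(self, now: float) -> list[str]:
        # Fault detection: suspect every node silent for longer than `timeout`.
        return [name for name, node in self.nodes.items()
                if now - node.last_heartbeat > self.timeout]

    def recover(self, name: str, now: float) -> float:
        # Fault recovery: restart the node from its last stored state and, in
        # this sketch, return the simulation time to roll back to.
        sim_time, state = self.checkpoints[name]
        self.nodes[name] = SimulatorNode(name, copy.deepcopy(state), now)
        return sim_time


coord = FaultTolerantCoordinator(timeout=2.0)
coord.register(SimulatorNode("A", {"queue": 3}), now=0.0)
coord.register(SimulatorNode("B", {"queue": 5}), now=0.0)
coord.save_state("A", sim_time=10.0)
coord.save_state("B", sim_time=10.0)
coord.nodes["B"].state["queue"] = 7   # B advances past its checkpoint ...
coord.heartbeat("A", now=1.5)         # ... then stops sending messages
coord.heartbeat("A", now=3.0)

failed = coord.detect_faults(now=3.0)
rollback_to = {name: coord.recover(name, now=3.0) for name in failed}
print(failed, rollback_to, coord.nodes["B"].state)
# prints: ['B'] {'B': 10.0} {'queue': 5}
```

The `timeout` value embodies the trade-off the paper's experiments explore: a shorter timeout notices a crashed simulator sooner but is more likely to flag a slow yet healthy one as failed, while a longer timeout avoids false alarms at the cost of slower recovery.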

Similar Sources

Towards an MPI-like Framework for Azure Cloud Platform

Message passing interface (MPI) has been widely used for implementing parallel and distributed applications. The emergence of cloud computing offers a scalable, fault-tolerant, on-demand alternative to traditional on-premise clusters. In this thesis, we investigate the possibility of adopting the cloud platform as an alternative to conventional MPI-based solutions. We show that cloud platform c...

FT-MPI, Fault-Tolerant Metacomputing and Generic Name Services: A Case Study

There is a growing interest in deploying MPI over very large numbers of heterogeneous, geographically distributed resources. FT-MPI provides the fault-tolerance necessary at this scale, but presents some issues when crossing multiple administrative domains. Using the H2O metacomputing framework, we add cross-administrative domain interoperability and pluggability to FT-MPI. The latter feature al...

A generalized ABFT technique using a fault tolerant neural network

In this paper we first show that the standard BP algorithm cannot yield a uniform information distribution over the neural network architecture. A measure of sensitivity is defined to evaluate the fault tolerance of a neural network, and we then show that the sensitivity of a link is closely related to the amount of information passing through it. Based on this assumption, we prove that the distribu...

Failure Resilient Heterogeneous Parallel Computing Across Multidomain Clusters

We propose lightweight middleware solutions that facilitate and simplify the execution of failure-resilient MPI programs across multidomain clusters. The system described in this paper leverages H2O, a distributed metacomputing framework, to route MPI message passing across heterogeneous aggregates located in different administrative or network domains. MPI programs instantiate a specially writ...

Towards Fault-tolerant HLA-based Distributed Simulations

Large scale High Level Architecture (HLA)-based simulations are built to study complex problems, and they often involve a large number of federates and vast computing resources. Simulation federates running at different locations are subject to failure. The failure of one federate can lead to the crash of the overall simulation execution. Such risk increases with the scale of a distributed sim...

Publication date: 2009